Ensuring the Integrity of Wikipedia: A Data Science Approach

Author

  • Francesca Spezzano

Abstract

In this paper, we present our research on ensuring the integrity of Wikipedia, the world’s largest free encyclopedia. Because anyone can edit Wikipedia, many malicious users exploit this openness to make edits that compromise the quality of page content. Specifically, we present DePP, the state-of-the-art tool that identifies article pages in need of protection with 93% accuracy, and we introduce our research on identifying spam users. We show that we can distinguish spammers from benign users with 80.8% accuracy and a mean average precision of 0.88.

Similar resources

Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...

Ensuring HIV Data Availability, Transparency and Integrity in the MENA Region; Comment on “Improving the Quality and Quantity of HIV Data in the Middle East and North Africa: Key Challenges and Ways Forward”

In this commentary, we elaborate on the main points that Karamouzian and colleagues have made about HIV data scarcity in Middle Eastern and North African (MENA) countries. Without accessible and reliable data, no epidemic can be managed effectively or efficiently. Clearly, increased investments are needed to bolster capabilities to capture and interpret HIV surveillance data. We believe that th...

Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles

When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human-level intelligence, knowledge repositories are needed to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...

The New Relationship between Honest and Humble Ethical Leadership Approach and Organizational Performance

Background: Ethical leadership grounded in humility and integrity has been raised over the last few years as one of the newest leadership theories in organizations and has been considered a necessity in the academic environment. Therefore, the purpose of this study is to examine the relationship between the humble and integrity approaches of ethical leadership and organizational performanc...

Linear matrix inequality approach for synchronization of chaotic fuzzy cellular neural networks with discrete and unbounded distributed delays based on sampled-data control

In this paper, a linear matrix inequality (LMI) approach for synchronization of chaotic fuzzy cellular neural networks (FCNNs) with discrete and unbounded distributed delays based on sampled-data control is investigated. A Lyapunov-Krasovskii functional, combined with the input delay approach and the free-weighting matrix approach, is employed to derive several sufficient criteria in terms of...

Publication date: 2017